Hi,
I'm learning about CNNs right now and I'm working on a CNN classifier for binary classes. I came across a situation that I have difficulty explaining/justifying to myself...
In tutorials and reference books/websites, I always see a pattern of increasing the number of filters after each pooling layer. For example:
Starting with a 100px X 100px image:
Conv1: (100, 100, 32)
MaxPool: (50, 50, 32)
Conv2: (50, 50, 64)
MaxPool: (25, 25, 64)
Conv3: (25, 25, 128)
MaxPool: (12, 12, 128)
Conv4: (12, 12, 256)
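(A quick way to sanity-check the shape progression above, assuming 'same'-padded convolutions, which preserve height and width, and 2x2 max pooling with stride 2, which floor-divides odd sizes, e.g. 25 → 12:)

```python
# Trace (H, W, C) through a Conv -> MaxPool stack, assuming 'same'-padded
# convs and 2x2/stride-2 pooling after every conv except the last.

def trace_shapes(input_hw, filter_counts):
    """Return a list of (layer_name, (H, W, C)) for each layer."""
    h = w = input_hw
    shapes = []
    for i, c in enumerate(filter_counts):
        shapes.append(("conv%d" % (i + 1), (h, w, c)))  # conv keeps H and W
        if i < len(filter_counts) - 1:                  # pool between convs
            h, w = h // 2, w // 2                       # floor division: 25 -> 12
            shapes.append(("pool%d" % (i + 1), (h, w, c)))
    return shapes

for name, shape in trace_shapes(100, [32, 64, 128, 256]):
    print(name, shape)
```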
It is my understanding that we increase the number of filters as the feature maps shrink in order to catch patterns (descriptors) that are more and more abstract. I think I understand this well enough.
Here's the rub for me: I did some crude hyperparameter tuning on my model for fun and I obtained a much better accuracy on my test dataset with an architecture that alternated between a high and a low number of filters, such as:
Conv1: (100, 100, 64)
MaxPool: (50, 50, 64)
Conv2: (50, 50, 128)
MaxPool: (25, 25, 128)
Conv3: (25, 25, 32)
MaxPool: (12, 12, 32)
Conv4: (12, 12, 128)
MaxPool: (6, 6, 128)
Conv5: (6, 6, 8)
This architecture kind of reminds me of the bottlenecking you'd see in an autoencoder. I know it's not the same thing at all, but is there a reason it works, or is it a fluke?
In my case, I'm getting a 4% increase in accuracy on my test dataset with the second model. However, I can't find any papers about this type of architecture or why it actually worked as well as it did. Does anyone know of any explanation?
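(One concrete difference you can check yourself is the parameter count of the two conv stacks. The sketch below assumes 3x3 kernels and a 3-channel input, neither of which is stated in the post, so treat the absolute numbers as illustrative only:)

```python
# Rough per-layer parameter counts for the two conv stacks.
# Assumptions (not from the post): 3x3 kernels, 3-channel RGB input.

def conv_params(c_in, c_out, k=3):
    # weights (k*k*c_in per output filter) plus one bias per filter
    return (k * k * c_in + 1) * c_out

def total_params(filter_counts, c_in=3):
    total = 0
    for c_out in filter_counts:
        total += conv_params(c_in, c_out)
        c_in = c_out
    return total

print(total_params([32, 64, 128, 256]))     # monotonically widening stack
print(total_params([64, 128, 32, 128, 8]))  # "bottlenecked" stack
```

Under these assumptions the bottlenecked stack ends up with roughly 40% of the widening stack's convolutional parameters, so a regularization-style effect on a small dataset is at least plausible, though that doesn't prove it's the explanation.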
submitted by /u/Limiv0rous